1 Introduction

This HTML notebook will replicate the STATA do file named vnm_mics14_dp2019 for distribution in R. The goal is to create an R script that does what the said do file does.

I will follow the same section numbering as in the do file for ease of comparision.

2 Libraries

Before we go into the excercise, following are the packages from which we will use various functions.

## here() starts at D:/R projects/OPHI/Vietnam Translation/Vietnam-MICS-MPI-to-R
## -- Attaching packages ------------
## v ggplot2 3.3.0     v purrr   0.3.4
## v tibble  3.0.1     v dplyr   0.8.5
## v tidyr   1.0.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## Warning: package 'ggplot2' was built under R version 3.6.3
## Warning: package 'tibble' was built under R version 3.6.3
## Warning: package 'purrr' was built under R version 3.6.3
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'forcats' was built under R version 3.6.3
## -- Conflicts ---------------------
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Warning: package 'janitor' was built under R version 3.6.3
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
## Warning: package 'gt' was built under R version 3.6.3

3 Building a Well-Being Index from Vietnam MICS 2013-14

The memory and environment clearing commands and the commands for setting working folders and paths are not needed if one is working in Project mode through RStudio and uses the {here} (read as package here). This is a library for managing paths and directories.

4 Vietnam MICS 2014

4.1 Step 1: Data Preparation

__ Selecting main variables from CH, WM, HH & MN recode & merging with HL recode __

It should be noted that anthropometric data was not collected for children under 5 as part of the Viet Nam MICS 2014 dataset. Previously, nutrition data was collected as part of Viet Nam MICS 2011. However, the data was not collected in this round due to time and resource constraints as well as the availability of national nutrition survey data (p.61)

Above comments are copied from the STATA do file

4.1.1 Step 1.1: CH - Children’s Recode (Under 5)

According to the STATA do file there is no data for this section.

4.1.2 Step 1.2: BH - Birth Recode (All females 15-49 years who ever gave birth)

The purpose of step 1.2 is to identify children of any age who died in the last 5 years prior to the survey date. As seen in the STATA file

Loading the data from bh.sav.

The above code chunk loads the bh.sav and names it bh_dat(a data object). The clean names function gets all the variable names in lower snake case. IN case if anyone is wondering what are all the possible cases, please refere to the wonderful art by Allison Horst shown below.

Various Cases (Let me know your favourite)

Various Cases (Let me know your favourite)

Now let us take a glimplse at the data and names of the variables.

##  [1] "hh1"       "hh2"       "ln"        "bhln"      "bh2"       "bh3"      
##  [7] "bh4m"      "bh4y"      "bh5"       "bh6"       "bh7"       "bh8"      
## [13] "bh9u"      "bh9n"      "bh10"      "bh4c"      "bh4f"      "bh9c"     
## [19] "bh9f"      "hh6"       "hh7"       "wdoi"      "wdob"      "ethnicity"
## [25] "welevel"   "brthord"   "magebrt"   "birthint"  "wmweight"  "wscore"   
## [31] "windex5"   "wscoreu"   "windex5u"  "wscorer"   "windex5r"  "windex2"
hh1 hh2 ln bhln bh2 bh3 bh4m bh4y bh5 bh6 bh7 bh8 bh9u bh9n bh10 bh4c bh4f bh9c bh9f hh6 hh7 wdoi wdob ethnicity welevel brthord magebrt birthint wmweight wscore windex5 wscoreu windex5u wscorer windex5r windex2
1 2 2 1 1 1 4 2011 1 2 1 4 NA NA NA 1336 1 NA NA 1 1 1368 1044 1 4 1 2 1 2.167035 1.640778 5 1.337577 5 NA NA 2
1 2 2 2 1 1 5 2013 1 0 1 5 NA NA 2 1361 1 NA NA 1 1 1368 1044 1 4 2 2 2 2.167035 1.640778 5 1.337577 5 NA NA 2
1 3 2 1 1 2 3 2003 1 10 1 4 NA NA NA 1239 1 NA NA 1 1 1368 939 1 3 1 2 1 2.167035 1.427797 5 1.071892 5 NA NA 2
1 3 2 2 1 1 6 2007 1 6 1 5 NA NA 2 1290 1 NA NA 1 1 1368 939 1 3 2 2 4 2.167035 1.427797 5 1.071892 5 NA NA 2
1 4 2 1 1 2 11 2007 1 6 1 4 NA NA NA 1295 1 NA NA 1 1 1368 1008 1 4 1 2 1 2.167035 1.613568 5 1.303632 5 NA NA 2
1 6 2 1 1 1 3 1995 1 18 1 3 NA NA NA 1143 1 NA NA 1 1 1368 788 1 4 1 2 1 2.167035 1.624926 5 1.317802 5 NA NA 2

Now let us create variables, only those that are non grouped, i.e. thhose generated without using bysort. In R one can create grouped varibales in a different manner. Will show that in the code chunk following this one.

Creating Tables for new variables check.

age_death n percent valid_percent
0 242 1.563408e-02 0.455743879
1 33 2.131921e-03 0.062146893
2 17 1.098262e-03 0.032015066
3 17 1.098262e-03 0.032015066
4 9 5.814329e-04 0.016949153
5 5 3.230183e-04 0.009416196
6 14 9.044512e-04 0.026365348
7 6 3.876219e-04 0.011299435
8 10 6.460366e-04 0.018832392
9 2 1.292073e-04 0.003766478
11 5 3.230183e-04 0.009416196
12 23 1.485884e-03 0.043314501
14 5 3.230183e-04 0.009416196
18 3 1.938110e-04 0.005649718
19 1 6.460366e-05 0.001883239
20 2 1.292073e-04 0.003766478
23 4 2.584146e-04 0.007532957
24 22 1.421280e-03 0.041431262
36 27 1.744299e-03 0.050847458
48 14 9.044512e-04 0.026365348
60 6 3.876219e-04 0.011299435
72 9 5.814329e-04 0.016949153
84 3 1.938110e-04 0.005649718
96 5 3.230183e-04 0.009416196
108 6 3.876219e-04 0.011299435
120 1 6.460366e-05 0.001883239
132 1 6.460366e-05 0.001883239
144 3 1.938110e-04 0.005649718
156 1 6.460366e-05 0.001883239
168 3 1.938110e-04 0.005649718
180 1 6.460366e-05 0.001883239
192 7 4.522256e-04 0.013182674
204 3 1.938110e-04 0.005649718
216 7 4.522256e-04 0.013182674
228 4 2.584146e-04 0.007532957
240 3 1.938110e-04 0.005649718
252 1 6.460366e-05 0.001883239
264 2 1.292073e-04 0.003766478
276 3 1.938110e-04 0.005649718
312 1 6.460366e-05 0.001883239
NA 14948 9.656955e-01 NA
## Warning: Column `bh5` has different attributes on LHS and RHS of join
bh5 child is alive child is dead
1 14948 0
2 0 531
child18_died n percent
child is alive 15465 0.9990955488
child is dead 14 0.0009044512

Creating grouped variables.

Notice that I need not write equivalent R code for the following STATA code chunks:

1> This has been included in the grouped condition for the variable

replace tot_child18_died_5y=. if child18_died==1 & ydead_survey==. //Replace as ‘.’ if there is no information on when the child died

2> This need not be written as the structure of data is different, only one rwo exists per ind_id NOte that due to this feature there is no need to create “child18u_died_per_wom_5y” as it is same as “tot_child18_died_5y”

bysort ind_id: egen childu18_died_per_wom_5y = max(tot_child18_died_5y) lab var childu18_died_per_wom_5y “Total child under 18 death for each women in the last 5 years (birth recode)”

3> This need not be written as the structure of data is different, only one rwo exists per ind_id

//Keep one observation per women bysort ind_id: gen id=1 if _n==1 keep if id==1 drop id duplicates report ind_id

Creating variables for new grouped variables

##  tot_child_died    0 1
##               0 6647 0
##               1  382 7
##               2   55 0
##               3    6 2
##               4    2 0

Writing new files.

4.1.3 Step 1.3: WM - Women’s Record (All eligible females 15-49 in the household)

Loading data and setting case for variable (column) names.

Mutating the data for creating new variables. This is what one would refere to generating new varibles in STATA, I think.

Note, for the varible marital or marital_wom, I miss to see whyn the STATA code defines it as marital to begin with and renames it to marital_wom. I will define this varibale as marital_wom from the begining itself. Kindly bring to notice if this is incorrect.

While creating new variables the STATA code has various tables. I will generate those here.

## [1] TRUE

This means all Individual Ids are unique, i.e., every row represents a unique individual.

cm1 1 2 NA_
1 454 6647 0
2 0 2726 0
NA 0 0 363
## # A tibble: 10 x 2
##                           mstatus       ma6
##                         <dbl+lbl> <dbl+lbl>
##  1 1 [Currently married/in union]        NA
##  2 1 [Currently married/in union]        NA
##  3 1 [Currently married/in union]        NA
##  4 3 [Never married/in union]            NA
##  5 3 [Never married/in union]            NA
##  6 3 [Never married/in union]            NA
##  7 3 [Never married/in union]            NA
##  8 1 [Currently married/in union]        NA
##  9 1 [Currently married/in union]        NA
## 10 3 [Never married/in union]            NA
mstatus 1 2 3 NA_
1 0 0 0 6972
2 207 193 107 0
3 0 0 0 2348
NA 0 0 0 363
marital_wom n percent valid_percent
Currently Married 6972 0.68420020 0.70947390
Never Married 2348 0.23042198 0.23893355
Widowed 207 0.02031403 0.02106441
Divorced 193 0.01894014 0.01963977
Seperated/Not living together 107 0.01050049 0.01088837
NA 363 0.03562316 NA
ma6 Currently Married Never Married Widowed Divorced Seperated/Not living together NA_
1 0 0 207 0 0 0
2 0 0 0 193 0 0
3 0 0 0 0 107 0
NA 6972 2348 0 0 0 363
mstatus Currently Married Never Married Widowed Divorced Seperated/Not living together NA_
1 6972 0 0 0 0 0
2 0 0 207 193 107 0
3 0 2348 0 0 0 0
NA 0 0 0 0 0 363

Keeping relevant columns.

Writing the tables in csv.

4.1.4 Step 1.4: MR - MEn’s Recode (All eligible men in the household)

As mentioned in the STATA code, Note: There is no male recode file for Viet Nam MICS 2014. Hence the commands under this section have been remove

4.1.7 Step 1.7: Data Merging

Unlike the STATA code the merging here will look a little different. It will be in a single flow. However, checks will be shown prior to mrunnig the merge command.

##  hl7     n      percent
##    0 29190 0.7412392077
##    1  1065 0.0270441849
##    2  4650 0.1180802438
##    3  2037 0.0517267649
##    4  1479 0.0375571356
##    5   541 0.0137379380
##    6   228 0.0057897410
##    7   110 0.0027932961
##    8    38 0.0009649568
##    9    27 0.0006856272
##   10     9 0.0002285424
##   11     3 0.0000761808
##   13     2 0.0000507872
##   15     1 0.0000253936
##  women_wm FALSE  TRUE
##         1     0 10190
##        NA 29190     0
## [1] wm7     n       percent
## <0 rows> (or 0-length row.names)

Writing this object for safe keeping.